The patterns of life exhibited by large populations have
been described and modeled both as a basic science exer- 
cise and for a range of applied goals such as reducing auto- 
motive congestion, improving disaster response, and even 
predicting the location of individuals. However, these stud- 
ies previously had limited access to conversation content, 
rendering changes in expression as a function of movement 
invisible. In addition, they typically use the communication 
between a mobile phone and its nearest antenna tower to 
infer position, limiting the spatial resolution of the data to
the geographical region serviced by each cellphone tower. 
We use a collection of 37 million geolocated tweets to char- 
acterize the movement patterns of 180,000 individuals, tak- 
ing advantage of several orders of magnitude of increased 
spatial accuracy relative to previous work. Employing the 
recently developed sentiment analysis instrument known 
as the hedonometer, we characterize changes in word usage 
as a function of movement, and find that expressed happi- 
ness increases logarithmically with distance from an indi- 
vidual's average location. 

A proper characterization of human mobility patterns [1- 
16] is an essential component in the development of models 
of urban planning [17], traffic forecasting [18], and the spread 
of diseases [19-21]. In the modern communication era, pat- 
terns of human movement have been revealed at an increas- 
ingly higher resolution in both space and time, with mobile 
phone data in particular complementing existing survey-based 
investigations. As is the case with each new instrument mea- 
suring macroscale sociotechnical phenomena, the task has be- 
come one of understanding what discernible patterns exist, and 
what meaning can be derived from those patterns [3, 22-24]. 

Scientists working to understand mobility have employed a 
diverse set of methodologies. Brockmann et al. [7] used the circulation
of nearly 1/2 million U.S. dollar bills whose locations
were submitted by over 1 million visitors to a website [25] to
demonstrate that bank note trajectories are superdiffusive in 
space and subdiffusive in time, i.e. moving farther and more 
frequently than expected. 

Gonzalez et al. [2] used 6 months of mobile phone data from 
100,000 individuals to show that human trajectories are regular 
in space and time, with each individual having a high probability
of returning to a few preferred locations according to Zipf's
law. Combining phone communication data with measures of
community economic prosperity, Eagle et al. [3] showed that
the diversity of contacts in an individual's social network is 
strongly correlated to the potential for economic development 
exhibited by their community. 

Exemplifying recent work to characterize sentiment with 
social network communications, Mitchell et al. [26] combined
traditional survey data (e.g., Gallup) with millions of tweets
to correlate word usage with the demographic characteristics 
of U.S. urban areas. Expressed happiness was shown, for ex- 
ample, to correlate strongly with percentage of the population 
married, and anti-correlate with obesity. Words such as 'Mc- 
Donalds' and 'hungry' appeared far more frequently in obese 
cities, suggesting their instrument could be used to provide 
real-time feedback on social health programs such as the pro- 
posed ban on the sale of large sodas in New York City in 2013. 

In what follows, we characterize the pattern of life of over 
180,000 individuals in the U.S. using messages sent via the 
social networking service Twitter, and employ our text-based 
hedonometer [27] to characterize sentiment as a function of 
movement. In the calendar year 2011, we collected roughly 
4 billion messages through Twitter's gardenhose feed, repre- 
senting a random 10% of all status updates posted during this 
period. 

Along with an abundance of other metadata, location in- 
formation typically accompanies each message, resulting from 
one of three mechanisms by which individuals can report their 
location when updating their status. First, when an individual 
registers their account with Twitter, they are presented with 
the opportunity to report their location in a free text box. This 
region will be displayed in their user profile (e.g. 'NYC' or
'over the rainbow'). The metadata accompanying each tweet
sent by the individual contains this self-reported location. Second,
individuals submitting a message through a web browser
can choose to tag their message with a 'place' chosen from a 
drop-down menu, where the first option provided is typically 
the city within which the computer's IP address is found. For 
the purposes of accuracy, we have chosen to ignore each of 
these two mechanisms for reporting position when attempting 
to assign each tweet a geographical location, and focus instead 
on messages located via a third mechanism, namely the Global 
Positioning System (GPS). 

Individuals using an app provided by a mobile device may 
opt-in to geolocate their message, in which case the exact lat- 
itude and longitude of the mobile phone is reported. The ac- 
curacy of this information is governed by the precision of the 
GPS instrument embedded in the phone, which can vary de- 
pending on the surrounding topography. As a result of these 
factors, we are able to approximately place each geolocated 
message inside a 10 meter circle on the surface of the Earth, 
within which the tweet was sent. Approximately 1% of the 
status updates received through the gardenhose feed are ge- 
olocated, resulting in a total of 37 million messages, collec- 
tively representing more than 180,000 English-speaking peo- 
ple worldwide. Fig. 1 is representative of the geospatial reso- 
lution of the data. 

Results

Following Gonzalez et al. [2], we examine the shape of hu- 
man mobility using radius of gyration, hereafter gyradius, as 
a measure of the linear size occupied by an individual's trajec- 
tory. In Fig. 2, we investigate the geographical distribution of 
movement in four urban areas by plotting a dot for each tweet, 
colored by the gyradius of its author. Clockwise from the top 
left, cities are displayed in order of their apparent aggregate 

Figure 2: (Color online) The gyradius, calculated for each individual, is shown for each tweet authored in four example cities.
Reflecting the pattern of urban life, we find messages authored by large radius individuals to be more likely to appear in the
main downtown area of each city, while messages authored by small radius individuals tend to appear outside of the main urban
area. Histograms of gyradii for each city are shown in Fig. S1, along with tweet locations colored by distance from expected
location (Fig. S2). Note that higher resolution versions of the four panels above can be found online [28].



gyradius, with New York City seemingly exhibiting a smaller 
radius than the San Francisco Bay Area. Reflecting the pattern 
of urban life, we find messages authored by large radius indi- 
viduals to be more likely to appear in the main downtown area 
of each city, while messages authored by small radius individ- 
uals tend to appear in less densely populated areas. For ex- 
ample, in Chicago, many individuals writing from downtown 
exhibit an order of magnitude greater radius than individuals 
posting in areas outside of the city. 

In the greater Los Angeles area, we see several clusters of 
individuals with larger radius in downtown Los Angeles, as 
well as Long Beach, Santa Monica, and Disneyland in Ana- 
heim, while less densely populated areas are seen as smaller 
clusters exhibiting much smaller radii. The San Francisco Bay 
Area is clearly revealed by individuals with large radius, most 
notably in San Francisco, and somewhat less so in Oakland 
and San Jose. Outside of these cities, there are many suburban 
areas revealed by individuals with large radius, e.g. Palo Alto. 
Tweets appearing in less densely populated Bay Area locations 
are far more likely to be authored by large radius individu- 
als than those appearing in lower population areas elsewhere. 
This observation surely reflects the socio-economic and demo- 
graphic characteristics of individuals using Twitter in the Bay 
Area, where the social network service was founded. Addi- 
tionally, it could reflect the presence of tourists who will typ- 
ically have a larger radius than someone who lives and works 
in the Bay Area. 

The main observation apparent in Fig. 2, namely that in- 
dividuals who move a lot tend to appear in areas of large 
population density, is somewhat counterintuitive. Given the 
apparent economies of scale offered by living in a densely 
populated area, one might expect to observe the inverse rela- 
tionship, namely that people living in less densely populated 
areas travel further, by necessity, to their place of employ- 
ment or grocery store, for example. Of course, these individ- 
uals with large radius could be tourists, or they could have 
a long commute. Looking at each point colored instead by
distance from expected location (Fig. S2), we see a still more
exaggerated segregation, with non-natives appearing predominantly
in cities, and native individuals tweeting in the suburbs.
Looking at 500 cities in the U.S., we find a moderate correlation
between the mean gyradius and city land area (Pearson
$\rho = 0.24$, $p \approx 2 \times 10^{[\ldots]}$); Fig. S3 and Table S1 show the top
and bottom cities with respect to gyradii.

To investigate the shape of human mobility, we normalize 
each individual's trajectory to a common reference frame (see 
Methods). In Fig. 3, we plot a heat map of the probability 
density function of the normalized locations of all individuals. 
For the purposes of the discussion, we will refer to deviations 
from an individual's expected location in the normalized refer- 
ence frame as occurring in the directions north, south, east, and 
west. Several features of the map reveal interesting patterns of 
movement. First, the overall west-to-east teardrop shape of the 
contours demonstrates that people travel predominantly along 
their principal axis, namely heading west from the origin along
$y/\sigma_y = 0$, with deviations in the orthogonal direction becoming
shorter and less frequent as they move farther away from the
origin.

Second, the appearance of two spatially distinct yellow re- 
gions separated by a less populated green region suggests that 
people spend the vast majority of their time near two loca- 
tions. We refer to these locations as the work and home habi- 
tats [8], where the home habitat is centered on the dark red 
region roughly 1 standard deviation east of the origin, and the 
work habitat is centered approximately 2 standard deviations 
west of the origin. These locations highlight the bimodal dis- 
tribution of principal axis corridor messages (Fig. 4A). 

Finally, a clear asymmetry is observed about the $x/\sigma_x = 0$
axis, indicating the increasingly isotropic variation in movement
surrounding the home habitat, as compared to the work
habitat. We interpret this to be a reflection of the tendency to
be more familiar with the surroundings of one's home, and to
explore these surroundings in a more social context (Fig. 4B).
The symmetry observed when reflecting about the $y/\sigma_y = 0$
axis is strong, demonstrating the remarkable consistency of the
movement patterns revealed by the data.

In an effort to characterize the temporal and spatial struc- 
ture observed in Fig. 3, in Fig. 5 we examine locations fre- 
quently visited by the most prolific members of our data set, 
namely the roughly 300 individuals for whom we received at 
least 800 geolocated messages. We suspect that these individ- 
uals enabled the geolocating feature to be on by default for 
all messages, as implied by the roughly $\mathcal{O}(10^4)$ geolocated
messages suggested by the gardenhose rate. The main fig- 
ure shows the probability of tweeting from each habitat, with 
habitats ordered by rank, for each individual [8]. We find that
$P(H_i^{(a)}) \propto R(H_i^{(a)})^{-1.3}$, with $R$ the habitat rank; this is
approximately a Zipf distribution [29]. This finding indicates that
regardless of the number of tweet habitats for a given individual, the majority of
their messaging activity occurs in one of only a few habitats, 
with the probability decaying at a predictable rate. If the decay
were exactly Zipfian, an individual would be approximately $n$ times as likely
to tweet from their mode location as from their rank-$n$ location.
With our slope being steeper, these probabilities fall at
a faster rate with rank. Note that fitting the power law model
to the leading 10 habitats, using only individuals who have at
least 10 habitats, we also get a slope of $-1.3$.
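
The rank-probability slope discussed above can be illustrated with a short sketch. It assumes an ordinary least-squares fit in log-log space, which may differ from the fitting procedure actually used for Fig. 5; the function name `fit_rank_exponent` and the synthetic input are hypothetical.

```python
import math


def fit_rank_exponent(probabilities):
    """Least-squares slope of log10(probability) against log10(rank).

    `probabilities` holds one individual's habitat probabilities. They are
    sorted in descending order so that position 0 is the rank-1 habitat.
    Assumption: a plain log-log regression stands in for the paper's fit.
    """
    ranked = sorted((p for p in probabilities if p > 0), reverse=True)
    if len(ranked) < 2:
        raise ValueError("need at least two habitats with positive probability")
    xs = [math.log10(rank) for rank in range(1, len(ranked) + 1)]
    ys = [math.log10(p) for p in ranked]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Synthetic example: probabilities proportional to rank^(-1.3) over 10 habitats.
weights = [rank ** -1.3 for rank in range(1, 11)]
total = sum(weights)
slope = fit_rank_exponent([w / total for w in weights])
print(round(slope, 3))
```

A slope of $-1$ would correspond to an exact Zipf law, so a fitted value more negative than $-1$ means the probability falls off faster with rank.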

For roughly 95% of these individuals, each tweet has a 
greater than 10% chance of being authored from their mode lo- 
cation (Fig. 5B). Fig. 5C demonstrates each individual's like- 
lihood of authoring messages from their mode location (black 
curve) at different times of day throughout the week. A period- 
2 cycle is observed for each day of the week. Maxima are seen 
in the morning (8-10am) and evening (10pm-midnight), and
minima in the afternoon (2-4pm) and overnight (2-4am) hours. 
The peak in the morning is consistently higher than that in the 
evening, and the afternoon valley is consistently lower than the 
overnight valley. The cycle is somewhat less structured on the 
weekend. Also plotted are the probabilities of tweeting from 
locations other than the mode (red curve). While the shape 
is quite similar to the mode location probability, we do note 
that individuals tweeting at 2am are likely to be anywhere but 
home. 

In a study performed with cellphone tower data, Gonzalez 

Figure 3: (Color online) The probability density function of observing an individual in their normalized reference frame, where
the origin corresponds to each individual's expected location, and $y/\sigma_y = 0$ corresponds to their principal axis. This map shows
the positions of over 37,000 individuals, each with more than 50 locations, in their intrinsic reference frame.



[Plot residue from Figs. 3 and 4 omitted. Recoverable labels: horizontal axis $x/\sigma$ (ticks from -15 to 15); horizontal axis log10 radius of gyration (km), with fitted slope = -0.071.]



Figure 4: Looking at messages authored in the principal axis corridor, defined by $|y/\sigma_y| < [\ldots]$, we observe a clear separation
between the most likely and second most likely position (A). The distribution is skewed left, with movement in a heading
opposite an individual's work/home corridor observed to be highly unlikely. In addition, due to the normalization, we see that
individuals are much more likely to tweet slightly east of their expected location than slightly west. The isotropy ratio (B) measures
the change in the density's shape as a function of gyradius, with large radius individuals exhibiting a less circular pattern
of life. Standard errors are plotted, but are only visible for the largest radius group. The isotropy ratio decays logarithmically
with radius.



et al. [2] found that people spend most of their time in two lo- 
cations, and a person's probability of being found at a separate 
location diminishes rapidly with rank by visitation. While our 
investigation reveals a similar pattern, we find a larger differ- 
ence in the probability that an individual is tweeting from the 
home habitat than from the work habitat. We attribute these
slight differences in our results to the different spatiotemporal
precision of location data, as well as differences in the activities
represented by the data. Gonzalez et al. determined each
individual's location by continuously monitoring the nearest
cellphone tower within whose range they fell. In contrast, we
receive more precise location information, but only when
individuals performed the act of tweeting.

One major advantage of using Twitter data to study move- 
ment is the additional source of information provided by the 
messages themselves. Researchers using mobile phone data to 
characterize mobility patterns do not have access to conversa- 
tions occurring during the time period of interest. To measure 
the sentiment associated with different patterns of movement, 
we use the hedonometer introduced by Dodds et al. in [27]. 
The instrument performs a context-free measurement of the 
happiness of a large collection of words using the language as- 
sessment by Mechanical Turk (labMT) word list, as described 
in [27,30]. LabMT comprises roughly 10,000 of the most fre- 
quently used words in the English language, each of which 
was scored for happiness on a scale of 1 (sad) to 9 (happy) by 
people using Amazon's Mechanical Turk service [31,32], re- 
sulting in an average happiness score for each word. Example 
word scores are shown in Table 1.



To examine the relationship between movement and happi- 
ness, we calculate expressed happiness as a function of dis- 
tance from an individual's expected location, as well as gy- 
radius. For the former, we grouped tweets into ten equally 
populated bins, with each group containing more than 500,000 
tweets from similar distances. The happiness of each group 
was then computed using Eqn 3, where all words written from 
a given distance were gathered into a single bin. For the latter, 
we placed individuals into ten equally sized groups by gyra- 
dius, with each group containing more than 10,000 individuals 
with similar gyradii. 
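
The grouping into equally populated bins can be sketched as follows. This is a minimal illustration assuming that bins are formed by sorting and slicing; the function name `equal_population_bins` and the sample distances are hypothetical, and the example uses four bins rather than ten only to keep the data short.

```python
def equal_population_bins(values, n_bins=10):
    """Split item indices into n_bins groups of nearly equal size.

    Groups are ordered by increasing value. When the number of items is not
    divisible by n_bins, the earliest groups each receive one extra item.
    """
    if n_bins < 1 or len(values) < n_bins:
        raise ValueError("need at least one item per bin")
    order = sorted(range(len(values)), key=lambda i: values[i])
    base, extra = divmod(len(order), n_bins)
    bins, start = [], 0
    for b in range(n_bins):
        size = base + (1 if b < extra else 0)
        bins.append(order[start:start + size])
        start += size
    return bins


# Hypothetical distances (km) from expected location, one per tweet.
distances_km = [0.2, 1500.0, 3.1, 12.0, 0.9, 250.0, 45.0, 7.5, 0.1, 88.0, 610.0, 2.2]
groups = equal_population_bins(distances_km, n_bins=4)
print([len(g) for g in groups])
```

Binning by population rather than by fixed distance intervals keeps the amount of text per bin comparable, which matters when average word happiness is computed over each bin.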

Fig. 6 plots average word happiness against the distance 
from expected location (A), and gyradius (B). Starting with lo- 
cation, we find that tweets written close to an individual's cen- 
ter of mass are slightly happier than those written 1km away. 
The least happy words, on average, are used at a distance rep- 
resentative of a short daily commute to work. Beyond this 
least happy distance, remarkably we find that happiness in- 
creases logarithmically with distance from expected location. 
Perhaps even more remarkably, we find an almost identical 
trend when grouping together individuals rather than tweets, 
observing that happiness also increases logarithmically with 
gyradius. Individuals with a large radius use happier words 
than those with a smaller pattern of life. We find the trend ob- 
served in Fig. 6 holds for 3 of the 4 urban areas (Los Angeles, 
San Francisco, and Chicago), see Figs. S4, S5. 

To explain the difference in expressed happiness exhibited 
by different mobility groups, we turn to word shift graphs in 
Fig. 7. Word shift graphs were introduced by Dodds and 

Figure 5: Representing the approximately 300 individuals for whom we have at least 800 geolocated messages, we plot the
probability of tweeting from a habitat as a function of the tweet habitat rank (A). Each dot represents a single individual's
likelihood of tweeting from one of their habitats. The axes are logarithmic, revealing an approximate Zipfian distribution with
slope -1.3 [29]. (B) Distribution of the rank-1 habitat, each individual's mode location. (C) A robust diurnal cycle is observed
in the hourly time of day at which statuses are updated, with those from the mode location (black curve) occurring more often
than other locations (red curve) in the morning and evening. Probabilities sum to 1 for each curve, with bins for each hour.
Dashed vertical lines denote midnight.




[Figure 6 panels omitted. Horizontal axes (logarithmic): (A) distance (km) from expected location; (B) user radius of gyration (km).]



Figure 6: Tweets are grouped into ten equally populated bins by the distance from their author's expected location, and the 
average happiness of words written at each distance is plotted (A). Expressed happiness grows logarithmically with distance 
from expected location. A similar trend is observed when individuals are grouped into ten equally populated bins by their 
gyradius, and all words authored by individuals in each bin are gathered (B). These observed trends persist through variation in 
binning and different measures of mobility. 



Danforth [27, 33] as a means for investigating the elements of 
language responsible for happiness differences between two 
large texts. As an example, consider the difference between 
tweets authored at distances of roughly 1km and 2500km away 
from an individual's expected location. The average happiness 
scores for these two distances are $h_{\mathrm{avg}} = 5.96$ and $h_{\mathrm{avg}} = 6.13$
respectively. Individual word contributions to this difference 
are shown in Fig. 7A, and can be described as follows. 

Words appearing on the right increase the happiness of the 
2500km distance relative to the 1km distance. For example, tweets
authored far from an individual's expected location are more 
likely to contain the positive words 'beach', 'new', 'great', 
'park', 'restaurant', 'dinner', 'resort', 'coffee', 'lunch', 'cafe', 
and 'food', and less likely to contain the negative words 'no', 
'don't', 'not', 'hate', 'can't', 'damn', and 'never' than tweets 
posted close to home. Words going against the trend appear 
on the left, decreasing the happiness of the 2500km distance 
group relative to the 1km group. Tweets close to home are 
more likely to contain the positive words 'me', 'lol', 'love',
'like', 'haha', 'my', 'you', and 'good'. Moving clockwise, the 
three insets in Fig. 7A show that the two text sizes are com- 
parable, the biggest contributor to the happiness difference is 
the decrease in negative words authored by individuals very 
far from their expected location, and the 50 words listed make 
up roughly 50% of the total difference between the two bags 
of words. 

Note that the relatively small differences in $h_{\mathrm{avg}}$ scores reflect
a small signal, yet one that we have shown previously can
be resolved by our hedonometer [27]. Additional word shift 
comparisons for the four urban areas investigated earlier are 
provided in the Supplemental Material, Figs. S6, S7. 



Looking at the word differences between individuals with 
largest and smallest radii of gyration in Fig. 7B, we see that 
individuals in the large radius group author the negative words 
'hate', 'damn', 'dont', 'mad', 'never', 'not' and assorted pro- 
fanity less frequently, and the positive words 'great', 'new', 
'dinner', 'hahaha', and 'lunch' more frequently than the small 
radius group. Going against the trend, the large radius group 
uses the positive words 'me', 'lol', 'love', 'like', 'funny',
'girl', and 'my' less frequently, and the negative words 'no', 
and 'last' more frequently. Comparing with other groups, the 
large radius group uses words in reference to eating, such as
'dinner', 'lunch', 'restaurant', and 'food', more frequently,
and makes less reference to traffic congestion.

Comparing the two figures, we note that individuals with 
large radius laugh more (e.g. 'hahaha') than those with a small
radius, but individuals closer to their expected location laugh 
more than those far from home. 

These word differences reveal the relationship between an 
individual's pattern of movement and their experiences. It is 
not surprising to observe regular international travelers tweet- 
ing about the food they enjoy on vacation. Indeed, we expect 
that individuals capable of tweeting at a great distance from 
their expected location are more likely to benefit from an ad- 
vantaged socioeconomic status, which they happily update fre- 
quently. Previous work has demonstrated that expressed hap- 
piness correlates strongly with many socioeconomic indica- 
tors [26]. Nevertheless, setting aside these luxurious words, 
we still see a general decline in the use of negative words as 
individuals travel farther from their expected location. In fact, 
of the four contributions to the difference in happiness between 

Figure 7: (Color online) Word shift graphs comparing (A) the lowest average word happiness distance from home group to

words authored close to home vs. far from home, this decline
in negative words when far from home is the largest compo- 
nent (bottom right inset, Fig. 7). 

Discussion

Using 37 million geolocated tweets authored in 2011, this 
study characterizes the pattern of life of over 180,000 individ- 
uals in the United States. While observed mobility patterns 
agree qualitatively with previous work investigating cellphone 
data [2], we are able to connect movement patterns to changes 
in word usage for the first time. Our main finding is that ex- 
pressed happiness increases logarithmically with both distance 
from expected location and gyradius, largely because individuals
who travel farther use positive, food-related words more
frequently, and negative words and profanity less frequently.

Several methodological issues are raised by the use of Twit- 
ter messages to characterize mobility and happiness. Consid- 
ering Twitter as a source, we note that according to the Pew 
Internet & American Life Project, roughly 15% of adults in the 
U.S. were actively using Twitter at the end of 2011 [34]. While
this fraction represents a substantial group of Americans, we 
have no data to quantify the demographic group represented 
by the subset of these 15% who specifically choose to geotag 
a large percentage of their messages. Nevertheless, since we 
threshold the sample to include individuals who have geolocated
more than approximately 300 of their messages in 2011,
we suspect that the large majority of individuals represented in 
our study regularly do so as a matter of daily life, as opposed 
to geolocating messages only when encountering a novel ex- 
perience such as a vacation. 

Regarding word usage as a proxy for happiness, accessing 
the internal emotional state of individuals is beyond the scope 
of our instrument. We do believe, however, that when aggregated,
the words used by large groups of individuals reflect
their culture in ways not captured by surveys or self-report. 
Indeed, we see the hedonometer as complementing more tra- 
ditional economic methods for characterizing economic and 
societal health, such as the Gross Domestic Product or Con- 
sumer Confidence Index. Using the same collection of geolo- 
cated messages explored here, the hedonometer was recently 
employed by Mitchell et al. [26] to characterize trends in word 
usage for cities. Expressed happiness was shown to correlate 
to hundreds of demographic, socio-economic, and health mea- 
sures, with interactive evidence available in the article's online 
Appendix [35]. 

This work contributes to a growing body of literature aimed 
at observing, describing, modeling, and ultimately explain- 
ing the spatiotemporal dynamics of large-scale socio-technical 
systems. Natural extensions of this work might combine topo- 
logical measures of network interactions with geospatial data 
to predict the likelihood of new links appearing in a social net- 
work [36], or to measure the spread of emotions through ge- 
ographical and topological space [37]. The mobility patterns 
described here could also be combined with more traditional 
surveys (e.g. census data) to inform public policy regarding 
many important issues, for example relating to the 'obesity 
epidemic' and changes in word usage at the level of individual 
neighborhoods targeted by public health campaigns. 



Methods 

In an effort at quality control for the geolocated messages, 
we identified and removed messages posted by robotic ac- 
counts and programmed tweeting services designed to au- 
tomatically send tweets typically not reflecting information 
about human activity. Preliminary analyses revealed a notice- 
able presence of bots posting geolocated messages referring to 
weather, earthquakes, traffic, and coupons. We identified and 
ignored tweets collected from individuals for whom at least 
half of their tweets contained any of the words 'pressure', 'hu- 
mid', 'humidity', 'earthquake', 'traffic' or 'coupon'. 
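
The account-level filter described above can be sketched as follows. The keyword list and the one-half threshold come from the text; the tokenizer, the function names (`tokenize`, `is_probable_bot`, `drop_bot_accounts`), and the dictionary layout of the input are assumptions made for illustration.

```python
import re

# Keywords listed in the text as indicative of automated accounts.
BOT_WORDS = frozenset({"pressure", "humid", "humidity", "earthquake", "traffic", "coupon"})


def tokenize(text):
    """Lowercase a tweet and split it into alphabetic tokens (assumed tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())


def is_probable_bot(tweets, bot_words=BOT_WORDS, threshold=0.5):
    """True when at least `threshold` of an account's tweets contain a flagged word."""
    if not tweets:
        return False
    flagged = sum(1 for text in tweets if bot_words.intersection(tokenize(text)))
    return flagged / len(tweets) >= threshold


def drop_bot_accounts(tweets_by_user):
    """Keep only accounts that do not trip the keyword filter.

    `tweets_by_user` maps a user identifier to a list of tweet texts
    (a hypothetical layout, not the format of the original database).
    """
    return {
        user: tweets
        for user, tweets in tweets_by_user.items()
        if not is_probable_bot(tweets)
    }
```

Matching on whole tokens rather than substrings avoids flagging unrelated words that merely contain a keyword, though the source does not state which matching rule was applied.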

Messages referencing Foursquare check-ins (typically of the 
form 'I'm at Starbucks http://4sq.com/qrel9g') were retained 
for the purpose of characterizing the mobility profile of each 
individual. However, for results involving happiness, we ig- 
nored Foursquare check-in tweets as their content is unlikely 
to directly reflect sentiment. 

Finally, to ensure that individual movement profiles are 
based on a reasonably sized collection of locations, for this 
study we focus on individuals for whom we have at least 30 
geolocated tweets. Given the uniformity of the random sample 
provided by the gardenhose, we can assume these individuals 
geolocated a minimum of approximately 300 status updates in 
2011. 

For reasons of privacy, we ignored all user specific in- 
formation including individual names. In addition, where 
the trajectories traced out by specific individuals are vi- 
sualized, we obscured the coordinate system of reference. 
Tweets were assigned to urban areas as defined by the 2010 
United States Census Bureau's MAF/TIGER (Master Address
File/Topologically Integrated Geographic Encoding and Ref- 
erencing) database [38]. 

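The gyradius for individual $a$ is defined as

$$ r^{(a)} = \sqrt{ \frac{1}{N^{(a)}} \sum_{i=1}^{N^{(a)}} \left( \vec{p}_i^{\,(a)} - \langle \vec{p}^{\,(a)} \rangle \right)^2 } \qquad (1) $$

where the two-dimensional vector $\vec{p}_i^{\,(a)}$ is the $i$th position in
the trajectory of individual $a$, given by the geolocation of that
individual's $i$th tweet, as observed in our database. $N^{(a)}$ is
the total number of tweets from individual $a$, and
$\langle \vec{p}^{\,(a)} \rangle = \frac{1}{N^{(a)}} \sum_{i=1}^{N^{(a)}} \vec{p}_i^{\,(a)}$
is the center of mass of their trajectory, which
we denote their expected location. Note that if we consider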
each message to be a prediction of an individual's location, 
then the gyradius is in fact the root mean square error (RMSE) 
of that prediction. Fig. S8 plots the Complementary Cumu- 
lative Distribution Function (CCDF) of the gyradii of all indi- 
viduals. 
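
As a minimal sketch of the gyradius calculation, the following assumes that each tweet location has already been projected onto planar coordinates in kilometers; the projection step is not specified here, and the function names are hypothetical.

```python
import math


def center_of_mass(points):
    """Mean position of a trajectory given as (x, y) pairs."""
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)


def gyradius(points):
    """Root mean square distance of the points from their center of mass.

    Assumption: `points` are planar (x, y) coordinates in km, not raw
    latitude and longitude.
    """
    if not points:
        raise ValueError("trajectory must contain at least one point")
    cx, cy = center_of_mass(points)
    mean_sq = sum((x - cx) ** 2 + (y - cy) ** 2 for x, y in points) / len(points)
    return math.sqrt(mean_sq)


# Four tweets at the corners of a 2 km square: each lies sqrt(2) km from the center.
print(gyradius([(0, 0), (2, 0), (0, 2), (2, 2)]))
```

Because the result is the root mean square deviation from the expected location, it matches the RMSE interpretation given above.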

To compare the shape of individual trajectories, we normalize
for both differences in gyradius and direction of trajectory.
Considering each individual's trajectory as a set of $(x, y)$-pairs
$\{(x_1, y_1), (x_2, y_2), \ldots, (x_N, y_N)\}$, we calculate the two-dimensional
matrix known as the tensor of inertia, considering
each point in an individual's trajectory as an equally weighted
mass at location $(x_i, y_i)$. We then find this tensor's eigenvectors
and eigenvalues. The eigenvector corresponding to the
largest eigenvalue represents the axis along which most of the
individual's trajectory occurs (hereafter called the individual's
principal axis). Previous work has demonstrated that for most
individuals, this axis is parallel to the corridor between their
work location and home [2, 4].

To normalize the different compass orientations of individual
trajectories, we rotate the coordinate system of each individual
so that their principal axis points due west. The expected location
for each individual $(\bar{x}, \bar{y})$ is then used to translate
their position vector, i.e. $(x_i - \bar{x}, y_i - \bar{y})$, to ensure
that the shape of each individual's trajectory is in a common frame of
reference. However, the distances travelled by each individual
vary widely despite their shared orientation (e.g. pedestrian
vs. airline commute). In order to compare these trajectories,
we calculate the standard deviations $\sigma_x$ and $\sigma_y$ of a
given individual's trajectory, and divide their x- and y-coordinates
by $\sigma_x$ and $\sigma_y$, respectively. For more information about
this process, including a pair of example trajectory normalizations,
see Figs. S9-S13.
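The three normalization steps (translate, rotate, rescale) can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: points are planar (x, y) pairs in kilometers, and the principal axis is found from the scatter matrix of the centered points, whose largest-eigenvalue eigenvector is the direction of greatest spread. The closed-form angle used below is the standard expression for that direction in two dimensions. The sketch aligns the principal axis with the x-axis but does not fix which end points west; that sign convention would need an extra rule.

```python
import math


def principal_angle(centered):
    """Angle (radians) of the direction of greatest spread of centered points."""
    n = len(centered)
    sxx = sum(x * x for x, _ in centered) / n
    syy = sum(y * y for _, y in centered) / n
    sxy = sum(x * y for x, y in centered) / n
    return 0.5 * math.atan2(2.0 * sxy, sxx - syy)


def normalize_trajectory(points):
    """Translate to the expected location, rotate the principal axis onto
    the x-axis, and divide each coordinate by its standard deviation."""
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    centered = [(x - cx, y - cy) for x, y in points]

    # Rotate by -theta so the principal axis lies along the x-axis.
    theta = principal_angle(centered)
    c, s = math.cos(theta), math.sin(theta)
    rotated = [(c * x + s * y, -s * x + c * y) for x, y in centered]

    # The rotated points are still centered, so the mean square is the variance.
    sigma_x = math.sqrt(sum(x * x for x, _ in rotated) / n)
    sigma_y = math.sqrt(sum(y * y for _, y in rotated) / n)
    return [
        (x / sigma_x if sigma_x > 0 else 0.0, y / sigma_y if sigma_y > 0 else 0.0)
        for x, y in rotated
    ]
```

After this transformation every trajectory has zero mean and unit standard deviation along both axes, so trajectories of very different extent can be overlaid in one frame.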

In an attempt to characterize time spent in each location, we
define the $i$th tweet habitat for individual $a$, denoted $H_i^a$, to
be a circle within which individual $a$ posted at least 10 messages
[8]. The center of the circle is defined by the average
position of all messages appearing in the habitat, and the radius
of the circle is chosen such that each tweet posted within
a habitat is at most 100 meters away from the center, and no
habitats overlap. To measure the importance of habitat $i$ to
individual $a$, we count the number of messages appearing in


word        $h_{\mathrm{avg}}(w_i)$
'happy'     8.30
'hahaha'    7.94
'fresh'     7.26
'cherry'    7.04
'pancake'   6.96
'piano'     6.94
'and'       5.22
'the'       4.98
'of'        4.94
'down'      3.66
'worse'     2.70
'crash'     2.60
':('        2.36
'war'       1.80
'jail'      1.76

Table 1: Example language assessment by Mechanical Turk
(labMT) [27, 30] words and scores. Words with neutral scores
$4 < h_{\mathrm{avg}}(w_i) < 6$ are colored gray and ignored when
assigning the happiness score to a large text.
where $|H_i^a|$ is the number of tweet locations contained in
$H_i^a$. Notice that the habitat probabilities for individual $a$ may
not sum to one since it may be the case that individual $a$ has
tweet locations that are not contained in a tweet habitat. Hereafter,
we will refer to an individual's most frequently visited,
or rank-1 habitat, as their mode location.
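A small sketch of the habitat ranking, assuming that the probability of a habitat is its message count divided by the individual's total number of tweet locations (an assumption consistent with the remark that the probabilities need not sum to one). The input format, one habitat label per tweet with `None` for tweets outside any habitat, and the function name are hypothetical; how tweets are assigned to circular habitats is not covered here.

```python
from collections import Counter


def habitat_probabilities(labels, min_messages=10):
    """Rank habitats by message count and return (rank, habitat, probability).

    labels: one entry per tweet, a habitat id or None if the tweet lies
    outside every habitat. Habitats with fewer than min_messages tweets
    are not counted as habitats.
    """
    total = len(labels)
    counts = Counter(label for label in labels if label is not None)
    kept = {h: c for h, c in counts.items() if c >= min_messages}
    # Sort by descending count; ties are broken by label for determinism.
    ranked = sorted(kept.items(), key=lambda item: (-item[1], str(item[0])))
    return [(rank, h, c / total) for rank, (h, c) in enumerate(ranked, start=1)]
```

The first entry of the returned list corresponds to the rank-1 habitat, i.e. the mode location.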

Using the labMT scores [27], we determine the average happiness
($h_{\mathrm{avg}}$) of a given text $T$ containing $N$ unique words by

$$h_{\mathrm{avg}}(T) = \frac{\sum_{i=1}^{N} h_{\mathrm{avg}}(w_i)\, f_i}{\sum_{i=1}^{N} f_i} = \sum_{i=1}^{N} h_{\mathrm{avg}}(w_i)\, p_i,$$

where $f_i$ is the frequency with which the $i$th word $w_i$, for
which we have an average word happiness score $h_{\mathrm{avg}}(w_i)$,
occurred in text $T$. The normalized frequency of $w_i$ is then
given by $p_i = f_i / \sum_{j=1}^{N} f_j$.
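The frequency-weighted average can be sketched in a few lines. This is a minimal illustration, assuming word counts and labMT scores are available as plain dictionaries; the function name and the `delta` parameter (the half-width of an optional band of neutral scores around 5 to exclude) are introduced here for illustration.

```python
def average_happiness(word_counts, scores, delta=0.0, neutral=5.0):
    """Frequency-weighted mean happiness of the scored words in a text.

    word_counts: dict mapping word -> number of occurrences in the text.
    scores: dict mapping word -> average happiness score.
    delta: words with |score - neutral| < delta are ignored.
    """
    weighted_sum = 0.0
    total_count = 0
    for word, count in word_counts.items():
        h = scores.get(word)
        if h is None:
            continue  # word has no happiness score
        if abs(h - neutral) < delta:
            continue  # word falls inside the excluded neutral band
        weighted_sum += h * count
        total_count += count
    if total_count == 0:
        raise ValueError("no scored words in text")
    return weighted_sum / total_count
```

For example, with the Table 1 scores for 'happy', 'jail', and 'the', setting `delta=1.0` drops 'the' from the average while `delta=0.0` keeps every scored word.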

The hedonometer instrument can be tuned to emphasize the
most emotionally charged words by removing words within
$\Delta h_{\mathrm{avg}}$ of the neutral score of $h_{\mathrm{avg}} = 5$.
It has been shown that ignoring these neutral words with
$4 < h_{\mathrm{avg}}(w_i) < 6$ provides a good balance of sensitivity
and robustness, and thus we chose $\Delta h_{\mathrm{avg}} = 1$.

Figure Captions



Figure 1: Each point corresponds to a geolocated tweet posted 
in 2011. Twitter activity is most apparent in urban areas. 
Note that the image contains no cartographic borders, simply a 
small dot for each message. Legend: A (U.S.), B (Washington, 
D.C.), C (Los Angeles, C.A.), and D (Earth). 



Figure 2: (Color online) The gyradius, calculated for each in- 
dividual, is shown for each tweet authored in four example 
cities. Reflecting the pattern of urban life, we find messages 
authored by large radius individuals to be more likely to ap- 
pear in the main downtown area of each city, while messages 
authored by small radius individuals tend to appear outside of 
the main urban area. Histograms of gyradii for each city are 
shown in Fig. S1, along with tweet locations colored by distance
from expected location (Fig. S2).



Figure 5: Representing the approximately 300 individuals for 
whom we have at least 800 geolocated messages, we plot the 
probability of tweeting from a habitat as a function of the 
tweet habitat rank (A). Each dot represents a single individ- 
ual's likelihood of tweeting from one of their habitats. The 
axes are logarithmic, revealing an approximate Zipfian dis- 
tribution with slope -1.3 [29]. (B) Distribution of the rank-1 
habitat, each individual's mode location. (C) A robust diurnal 
cycle is observed in the hourly time of day at which statuses 
are updated, with those from the mode location (black curve) 
occurring more often than other locations (red curve) in the 
morning and evening. Probabilities sum to 1 for each curve, 
with bins for each hour. Dashed vertical lines denote midnight.



Figure 3: (Color online) The probability density function of 
observing an individual in their normalized reference frame, 
where the origin corresponds to each individual's expected
location, and $y/\sigma_y = 0$ corresponds to their principal axis. This
map shows the positions of over 37,000 individuals, each with 
more than 50 locations, in their intrinsic reference frame. 



Figure 4: Looking at messages authored in the principal axis
corridor, defined by $|y/\sigma_y| < \frac{30}{1000}$, we observe a
clear separation between the most likely and second most likely
position (A).
The distribution is skewed left, with movement in a heading 
opposite an individual's work/home corridor observed to be 
highly unlikely. In addition, due to the normalization, we see 
that individuals are much more likely to tweet slightly east of 
their expected location than slightly west. The isotropy ratio 
(B) measures the change in the density's shape as a function 
of gyradius, with large radius individuals exhibiting a less cir- 
cular pattern of life. Standard errors are plotted, but are only 
visible for the largest radius group. The isotropy ratio decays 
logarithmically with radius. 



Figure 6: Tweets are grouped into ten equally populated bins 
by the distance from their author's expected location, and the 
average happiness of words written at each distance is plotted 
(A). Expressed happiness grows logarithmically with distance 
from expected location. A similar trend is observed when in- 
dividuals are grouped into ten equally populated bins by their 
gyradius, and all words authored by individuals in each bin are 
gathered (B). These observed trends persist through variation 
in binning and different measures of mobility. 



Figure 7: (Color online) Word shift graphs comparing (A) the 
lowest average word happiness distance from home group to 
the words authored farthest from home, which also has the 
largest average word happiness and (B) the smallest gyradius 
group with the largest gyradius group. The words in the word 
shifts from top to bottom appear in decreasing order of ranked 
percentage contribution to the overall average happiness difference
($\Delta h_{\mathrm{avg}}$) of the two texts being compared. The +/−
symbols indicate whether the word has an average happiness score
that is happy or sad relative to the entire text $T_{\mathrm{ref}}$.
The symbols ↑/↓ indicate whether a word was used more or less in
$T_{\mathrm{comp}}$ relative to usage in $T_{\mathrm{ref}}$. The left
inset panel shows how the ranked top contributing words to
$\Delta h_{\mathrm{avg}}$ combine in sum. The four circles in the lower
right show the total contribution of the four word types
(+↑, −↑, +↓, −↓). Relative text size
is represented by the grey squares. See [27] for further details 
and examples of word shift graphs. 
Figure S1: The distributions of gyradius (km) for four cities
appear to be log-normal. The mode distance (binned) is larger
for Los Angeles and San Francisco than for Chicago and New
York City. We note that these distributions were calculated for
all individuals whose expected location fell within the latitude
and longitude bounds of main text Fig. 2, and thus reflect a
different set of individuals from those identified with cities in
Fig. S3.
Figure S2: (Color online) The distance from expected location, calculated for each individual, is shown for each tweet authored
in four example cities in 2011. Messages authored far from this location are more likely to appear in the main downtown area of
each city (tourists and long-distance commuters), while messages authored close to the expected location tend to appear outside
of the main urban area.
Figure S3: (Color online) The mean gyradius of individu-
als whose expected location falls within each city is plotted 
against the city's population (A) and land area (B). Shown are 
cities containing at least 50 individuals with a nonzero gyra- 
dius, each individual having authored at least 30 geolocated 
tweets. City boundaries are defined by [38] which encom- 
passes a smaller area for the four cities illustrated in the main 
text Fig. 2. Generally, gyradius increases with city popula- 
tion and land area, with no large cities exhibiting a small mean 
radius. Pearson correlations: Population $\rho = 0.10$, $p = 0.03$;
Land Area $\rho = 0.24$, $p = 2 \times 10^{-7}$.

In Fig. S3, we plot the mean gyradius vs land area for each 
city in the U.S. as defined by [38]. While there is considerable 
scatter among the cities, we do observe a weak correlation in- 
dicating that individuals living in more populated and larger 
areas travel farther. Table S1 shows the top and bottom cities
with respect to mean gyradius, as well as the four cities dis- 
cussed above. 



rank   radius (km)   city
1      200.6         Martinsville, VA
2      124.5         Middletown, OH
3      112.3         Elkhart, IN
4      98.8          Pottstown, PA
5      96.6          Decatur, IL
215    13.3          New York City, NY
247    11.4          Chicago, IL
300    8.94          Los Angeles, CA
387    4.33          San Francisco & Oakland, CA
468    0.492         Greenville, MS
469    0.491         Athens, OH
470    0.465         Key West, FL
471    0.381         El Centro Calexico, CA
472    0.312         Pullman, WA

Table S1: Top and bottom 5 cities with respect to mean gyradius,
along with the four cities investigated in main text Fig. 2.
Figure S4: For New York City, Los Angeles, Chicago, and the San Francisco Bay Area, we group messages into equally sized
bins by the distance from expected location of their author, and measure the average word happiness of each group. These plots 
exhibit similar trends to that observed in main text Fig. 6A with the exception of New York City. 
Figure S5: For New York City (130 individuals/bin), Los Angeles (175 individuals/bin), Chicago (125 individuals/bin), and
the San Francisco Bay Area (63 individuals/bin), we group individuals into equally sized bins by their gyradius and measure 
the average word happiness of each group. These plots exhibit similar trends to that observed in main text Fig. 6B with the 
exception of the largest radius group in the San Francisco Bay Area, and New York City as a whole. 
Figure S6: (Color online) We compare the 6.55 km gyradius group versus the 292.03 km gyradius group (Left). We find that
the 292.03 km group has relatively frequent use of the words 'car' and 'weekend', suggesting that this group travels on the
weekends, perhaps to a vacation home as suggested by use of the word 'home'. (Right) We compare the 13.26 km gyradius
group versus the 292.03 km gyradius group. We find that the 292.03 km group uses the word 'car' more frequently than the
13.26 km group which, interestingly, uses the word 'traffic' more frequently. Again, the increased relative usage of these words
seems fitting for groups with these patterns of movement.
Figure S7: (Color online) (Left) A word shift comparing the 4.79 km gyradius group to the 223 km gyradius group for Chicago.
We observe the first group is less happy because of increased usage of profanity and negative words like 'can't', 'gone', and
'wrong'. (Right) A word shift comparing the 0.87 km gyradius group to the 123.54 km gyradius group for the San Francisco Bay
Area. We find the second group to be happier because of an increase in positive words like 'haha', 'win', 'weekend', 'funny',
and 'lol', along with a decrease in negative words like 'no', 'problem', and 'hate'.
Figure S8: Complementary Cumulative Distribution Function (CCDF) for the gyradii of all users with at least 30 geotagged
messages. Gonzalez et al. [2] found this distribution to be well modeled by a truncated power law with an exponential tail.
Normalizing Human Trajectory

To compare the shape of trajectories of individuals traveling 
in different directions and over different distances, we use the 
methods introduced by Gonzalez et al. [2]. We will examine 
the normalization steps for two individuals we will call user 
A and user B. We have 768 geolocated tweets for user A 
and 1,882 geolocated tweets for user B. User A has gyradius 
$r_g = 463.61$ km and user B has gyradius $r_g = 54.28$ km. Fig.
S9 represents the geospatial tweet locations for user A and 
user B, but we shifted their coordinate system to maintain 
their anonymity. We have also allowed for a slight spatial 
separation between the locations for user A and the locations 
of user B for clarity. 
Figure S9: (Color online) Tweet locations for User A and User
B. 

In Fig. S10, we apply the translation shifting
each location for the user to the distance in kilometers from
their center of mass, i.e. the expected location of the user.
The difference in gyradius between user A and user B is still
very apparent in the axis ranges for this plot. Notice that the
directional relationships between the tweet locations for each
user have still been preserved. We can see that user A travels
predominantly in a southwest direction, while user B travels
primarily in a northwest direction.

To normalize for direction of travel, let the set of
tweet locations for user $i$ be represented by the set of
equally weighted masses at each of the tweet locations
$\{(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)\}$. Now we calculate the
tensor of inertia ($I$) for each set of weighted $(x, y)$-points as
The eigenvector of $I$ corresponding to the largest eigenvalue of
$I$ represents the direction along which most of user $i$'s trajectory
occurs; we call this the principal axis for user $i$ (see Fig.
S11).
Figure S10: (Color online) Tweet locations for User A and
User B transformed to the distance in kilometers from their 
expected locations, respectively. 
Figure S11: (Color online) The tweet locations for User A and
User B along with a line representing the principal axis for that 
user. 
Now we can determine the angle necessary to rotate the set
of points for user $i$ so that the resulting principal axis is the
x-axis. Fig. S12 shows the results of this step. We see that the
principal axis for user A and user B is now the x-axis.
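As a worked form of this rotation step, write the unit eigenvector along the principal axis as $(v_x, v_y)$ and a centered tweet location as $(x, y)$; these symbols are introduced here for illustration only. Under that notation, the rotation angle and the rotated coordinates $(x', y')$ can be written as:

```latex
\theta = \operatorname{atan2}(v_y, v_x), \qquad
\begin{pmatrix} x' \\ y' \end{pmatrix}
=
\begin{pmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{pmatrix}
\begin{pmatrix} x \\ y \end{pmatrix}
```

Substituting $(x, y) = (v_x, v_y) = (\cos\theta, \sin\theta)$ gives $(x', y') = (1, 0)$, so the principal axis is mapped onto the x-axis. Because an eigenvector is only defined up to sign, an additional convention is needed to decide which end of the axis points west.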
Figure S12: (Color online) The results after rotating the loca-
tions of User A and User B. We see that they now both have 
principal axes of trajectory pointing due west. 

The final step is to normalize for individuals with different
gyradii. We accomplish this by dividing the x-coordinate
of each rotated tweet location for user $i$ by $\sigma_x$, where $\sigma_x$ is
the standard deviation of the x-coordinates of the rotated tweet
locations for user $i$, and similarly dividing by $\sigma_y$ for the
y-coordinates. The final result is shown in Fig. S13.
Figure S13: (Color online) The rotated tweet locations of User
A and User B after normalizing for gyradius. The origin represents
the center of mass of the respective individuals' trajectory,
namely $\langle p \rangle$ from equation (2).

As a result, we can compare the shape of the trajectories
for User A and User B having normalized for direction and
gyradius. We can see that both User A and User B have most
of their normalized tweet locations in two main clusters.